Topic Extraction of Crawled Documents Collection using Correlated Topic Model in Mapreduce Framework

Similar resources

A Correlated Topic Model Using Word Embeddings

Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, ...

Diverse Topic Phrase Extraction from Text Collection

Keyword extraction is an efficient approach to managing an explosion of online text on the Web. Traditionally, an abstraction of the online text is constructed through keywords, which are extracted according to a certain importance measure. One such measure is their occurrence frequency. However, previous work has not considered another important factor: the diversity of the keywords. Therefore,...

Topic Extraction from Text Documents Using Multiple-Cause Networks

This paper presents an approach to the topic extraction from text documents using probabilistic graphical models. Multiple-cause networks with latent variables are used and the Helmholtz machines are utilized to ease the learning and inference. The learning in this model is conducted in a purely data-driven way and does not require prespecified categories of the given documents. Topic words ext...

Correlated Tag Learning in Topic Model

It is natural to expect that the documents in a corpus will be correlated, and these correlations are reflected not only by the words but also by the observed tags in each document. Most previous works model this type of corpus, which is called a semi-structured corpus, without considering the correlations among the tags. In this work, we develop a Correlated Tag Learning (CTL) model for semi-s...

A correlated topic model of Science

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document ab...

Journal

Journal title: International Journal on Natural Language Computing

Year: 2019

ISSN: 2319-4111

DOI: 10.5121/ijnlc.2019.8602